Abstract: IT companies today spend more than 40 percent of their
costs on fixing software bugs. Traditionally, each bug is assigned
manually to a particular developer, an approach that depends too
heavily on manual effort. The alternative is an automatic bug triage
system, which assigns each reported bug to a developer and thereby
reduces the time and cost of manual work. Different classification
techniques are used to conduct automatic bug triage. In this paper,
we propose to apply machine learning techniques to assist in bug
triage, using text categorization to predict which developer should
be assigned to a bug based on its description. We also address the
problem of data reduction for bug triage, i.e., how the quality of
bug data can be improved.

Keywords: Bug triage, text categorization, bug data, data reduction.


1.  INTRODUCTION 

Automated bug tracking systems are significant in large software
development projects, where they manage bug reports and the list of
developers who work on fixing them. Bug tracking systems are
especially important in open source software development, where team
members can be dispersed around the world. In such distributed
projects, developers and other contributors may rarely see each
other. Moreover, the bug tracking system is used not only to keep
track of problem reports and feature requests, but also to help
coordinate the work among the different developers.

Software bugs are unavoidable, and fixing them is expensive in
software development. IT companies spend more than 40 percent of
their costs on fixing software bugs. Large software projects deploy
bug repositories to support information collection and to assist
developers in handling bugs. A bug repository contains all bug
reports and plays an important role in managing software bugs. In a
bug repository, a bug is maintained as a bug report, which records
the textual description of how to reproduce the bug and is updated
according to the status of bug fixing. A bug repository provides a
data platform to support many types of tasks on bugs, e.g., fault
prediction, bug localization, and reopened bug analysis.

The bug reports in a bug repository are called bug data. A bug
report is assigned to a developer, who starts to fix the bug. If the
assigned developer is unable to fix the bug, the bug is reassigned
to another developer. The process of assigning a bug report to an
appropriate developer is called bug triage. Since the task of a bug
triage system is to choose the appropriate developer for fixing each
bug, we follow existing work and remove unfixed bug reports.


Cubranic and Murphy first proposed the problem of automatic bug
triage, applying text categorization, a machine learning technique,
to assist in it [1]. Text categorization, also known as text
classification, is a technique for automatically sorting a set of
documents into categories from a predefined set; here, the category
is the developer, who is predicted from the bug's description [2].
For this they used supervised machine learning with a Naive Bayes
classifier to predict the correct developer.
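To illustrate the idea, the following minimal Python sketch trains a
multinomial Naive Bayes classifier with add-one smoothing on bug
descriptions and predicts a developer for a new report. It is not the
implementation used in [1]; the class name NaiveBayesTriager, the
whitespace tokenizer, and the toy reports and developer names are
assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a real system would do more)."""
    return text.lower().split()


class NaiveBayesTriager:
    """Hypothetical multinomial Naive Bayes triager with add-one smoothing."""

    def fit(self, descriptions, developers):
        self.word_counts = defaultdict(Counter)  # developer -> word -> count
        self.doc_counts = Counter(developers)    # developer -> number of reports
        self.vocab = set()
        for text, dev in zip(descriptions, developers):
            words = tokenize(text)
            self.word_counts[dev].update(words)
            self.vocab.update(words)
        return self

    def predict(self, description):
        # Words never seen in training carry no information and are skipped.
        words = [w for w in tokenize(description) if w in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_dev, best_score = None, -math.inf
        for dev, n_docs in self.doc_counts.items():
            counts = self.word_counts[dev]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_dev, best_score = dev, score
        return best_dev


# Toy training data (invented for this example).
reports = [
    "editor crashes when opening large file",
    "crash in editor on save",
    "build fails with compiler error",
    "compiler error on generic types",
]
fixers = ["alice", "alice", "bob", "bob"]

model = NaiveBayesTriager().fit(reports, fixers)
print(model.predict("editor crash on open"))     # alice
print(model.predict("compiler error in build"))  # bob
```

Working in log space avoids numerical underflow when descriptions
are long, and add-one smoothing keeps a single unseen word from
ruling out a developer entirely.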

Xuan et al. presented a semi-supervised approach for automatic bug
triage using text classification [3]. Their approach combines the
naive Bayes classifier with expectation maximization to take
advantage of both labelled and unlabelled bug reports. A classifier
is first trained on a fraction of labelled bug reports and is then
used to label numerous unlabelled bug reports. According to their
results, this semi-supervised approach improves the classification
accuracy of bug triage by up to 6% and avoids low-quality bug
reports.

When a bug report has been assigned to a specific developer who is
unable to fix the bug, that developer can forward or reassign the
bug to another developer. This reassignment of a bug from one
developer to another is called "bug tossing". Jeong et al. found
that in manual bug triage, 37 to 44 percent of bug reports are
"tossed" [4]. In addition, Jeong et al. built a model of bug tossing
to reduce the number of reassignments of bug reports. They proposed
a tossing graph approach based on Markov chains that captures the
past bug tossing history, which improved bug assignment and reduced
unnecessary tossing steps.
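A tossing graph of this kind can be sketched in a few lines of
Python. The sketch below is not the model of [4] itself: it only
estimates transition probabilities from historical tossing paths.
The function name build_tossing_graph, the goal_oriented switch, and
the sample paths are assumptions for illustration. When the switch
is set, each developer on a path is linked directly to the final
fixer; otherwise every individual toss is kept as an edge.

```python
from collections import Counter, defaultdict


def build_tossing_graph(paths, goal_oriented=True):
    """Estimate toss probabilities Pr(src -> dst) from tossing paths.

    Each path lists developers in the order a report visited them;
    the last developer is assumed to be the one who fixed the bug.
    """
    counts = defaultdict(Counter)
    for path in paths:
        if len(path) < 2:
            continue  # the report was never tossed
        fixer = path[-1]
        if goal_oriented:
            # Link every earlier developer directly to the fixer.
            edges = [(dev, fixer) for dev in path[:-1] if dev != fixer]
        else:
            # Keep each individual toss step as it happened.
            edges = list(zip(path[:-1], path[1:]))
        for src, dst in edges:
            counts[src][dst] += 1

    # Normalize the outgoing counts of each developer into probabilities.
    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        graph[src] = {dst: n / total for dst, n in dsts.items()}
    return graph


# Invented tossing history: a -> b -> c, a -> c, b -> d.
history = [["a", "b", "c"], ["a", "c"], ["b", "d"]]
print(build_tossing_graph(history))
# {'a': {'c': 1.0}, 'b': {'c': 0.5, 'd': 0.5}}
```

With such a graph, a triager could send a new report straight to the
most probable fixer for the developer it was first assigned to,
which is how tossing steps would be saved.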

P. Bhattacharya proposed a method for bug triage [12]. The goal of
the proposed system was to find the optimal set of machine learning
techniques to improve bug assignment accuracy in large projects.
This work [5] used a set of machine learning tools and a
probabilistic graph-based model (bug tossing graphs) that led to
highly accurate predictions and laid the foundation for the next
generation of machine learning-based bug assignment. The
methodology included choosing effective classifiers and features,
incremental learning, and multi-featured tossing graphs.

The current technique of bug triaging models the reassignment of
bugs as a goal-oriented path model [6]. V. Akila proposed a new
framework with additional capabilities, the Enriched Adaptive Bug
Triaging System (EABTS), which models the reassignment of bugs based
on the actual path model. Its graph structure captures the
relationship among developers as the number of tosses and also
captures the propinquity that exists among developers; in this sense
the graph structure is enriched. The technique is based on ant
routing, which is inherently adaptive in nature. Their work yields a
subgraph consisting of developers who are frequently involved in bug
resolution.

3.  PROPOSED  METHOD 

We propose to apply machine learning techniques to assist in bug
triage, using text categorization to predict which developer should
be assigned to a bug based on its description, and to improve the
quality of the bug data used for training. The following scenarios
determine how each bug report is labeled with a developer class in
our proposed bug tracking system:

• Scenario 1: If the report is resolved by the assigned developer,
it is labeled with that developer's class, regardless of who
submitted the report or the type of resolution.

• Scenario 2: If the report is resolved by someone other than the
assigned developer, and not by the person who submitted it, the
report is labeled with the class of the developer who marked it
resolved. The reasoning is that whoever resolves the report is the
person to whom it should have been assigned.

• Scenario 3: If the report is resolved as fixed, then regardless of
who the resolver was, we assume that the resolver is the developer
who implemented the fix and label the report with the resolver's
class, as he or she is the person who did the real work on the
report. This rule covers the frequent case where an Eclipse
developer files a report, which is then assigned to somebody else or
to a sub-team alias by default, and then later implements the fix
himself.

• Scenario 4: If the report was resolved as non-fixed by the person
who submitted it, and that person was not also assigned to it, the
report is labeled with the class of the first assigned developer.
Such a report might be a feature request or a bug that is later
handled by someone else, or the submitter, after being prompted by a
developer for details of his or her setup, may have discovered that
there is no bug.

• Scenario 5: If the report was resolved as non-fixed by the
submitter, who was not the assigned developer, and nobody responded,
then the report was filed in error or the submitter caught the
mistake before anyone started to work on the issue. These reports
are removed from the training set, as they cannot be reliably
labeled.

• Scenario 6: If no developer could resolve the report, it is marked
as non-fixed and labeled with the class of the developer who last
worked on the report.
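The scenarios above can be read as a small decision procedure. The
Python sketch below is one possible encoding, not a definitive
implementation: the BugReport fields, the order in which the rules
are checked, and the use of a responders list to separate Scenario 4
from Scenario 5 are all assumptions. Scenario 3 is read here as
labeling with the resolver, in line with its stated reasoning that
the resolver did the real work.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BugReport:
    # All field names are hypothetical; a real tracker export will differ.
    submitter: str
    assignees: List[str]               # developers assigned, in order
    resolver: Optional[str] = None     # who marked the report resolved
    resolution: Optional[str] = None   # "fixed" or "non-fixed"
    responders: List[str] = field(default_factory=list)
    last_worker: Optional[str] = None  # last developer who worked on it


def label_report(r: BugReport) -> Optional[str]:
    """Return the developer class for a report, or None to drop it."""
    if r.resolver is None:
        return r.last_worker       # Scenario 6: nobody resolved it
    if r.resolver in r.assignees:
        return r.resolver          # Scenario 1: resolved by an assignee
    if r.resolution == "fixed":
        return r.resolver          # Scenario 3: the resolver did the fix
    if r.resolver != r.submitter:
        return r.resolver          # Scenario 2: resolved by a third party
    # Remaining case: non-fixed, resolved by an unassigned submitter.
    if r.responders and r.assignees:
        return r.assignees[0]      # Scenario 4: first assigned developer
    return None                    # Scenario 5: cannot be labeled reliably


# Invented example: the submitter closes an unanswered report as non-fixed.
report = BugReport("sue", ["dev_a"], resolver="sue", resolution="non-fixed")
print(label_report(report))  # None
```

Reports for which the function returns None would be left out of the
training set, as described in Scenario 5.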

4.  CONCLUSIONS 

Bug triage is an expensive step in software maintenance in terms of
both cost and time. Our approach will combine feature selection with
instance selection to reduce the scale of bug data sets and to
improve data quality. To determine the order in which to apply
instance selection and feature selection to a new bug data set, we
will extract attributes of each bug data set and train a predictive
model on historical bug data sets. We will empirically investigate
data reduction for bug triage in the bug repositories of two large
open source projects, namely Eclipse and Mozilla. Our proposed
system will be based on Random Forest, which will provide an
efficient approach to data processing that reduces the scale of bug
data and provides high-quality bug data for software development and
maintenance.
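As a rough illustration of combining the two reduction steps, the
Python sketch below applies feature selection followed by instance
selection to tokenized bug reports. The concrete techniques are
stand-ins chosen for brevity, not the ones the proposed system will
use: words are ranked by document frequency, and reports that become
empty or duplicated after filtering are dropped. The function names
and the sample data are assumptions.

```python
from collections import Counter


def select_features(reports, k):
    """Keep the k words that occur in the most reports."""
    df = Counter()
    for words in reports:
        df.update(set(words))  # count each word once per report
    # Sort by descending frequency, then alphabetically for stability.
    ranked = sorted(df, key=lambda w: (-df[w], w))
    return set(ranked[:k])


def reduce_bug_data(reports, labels, k):
    """Apply feature selection, then instance selection."""
    keep = select_features(reports, k)
    seen = set()
    reduced_reports, reduced_labels = [], []
    for words, dev in zip(reports, labels):
        filtered = tuple(w for w in words if w in keep)
        key = (filtered, dev)
        # Instance selection: drop reports left empty or duplicated.
        if not filtered or key in seen:
            continue
        seen.add(key)
        reduced_reports.append(list(filtered))
        reduced_labels.append(dev)
    return reduced_reports, reduced_labels


# Invented tokenized reports and their fixers.
reports = [["editor", "crash"], ["editor", "crash"],
           ["build", "error"], ["typo"]]
fixers = ["alice", "alice", "bob", "bob"]
print(reduce_bug_data(reports, fixers, k=3))
# ([['editor', 'crash'], ['build']], ['alice', 'bob'])
```

Running the two steps in the opposite order can give a different
reduced data set, which is why the order of application has to be
decided for each new bug data set.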